Data Visualization#

Once upon a time there were plots upon plots upon plots.

Load data#

Hide code cell source
import pandas as pd
import sys
sys.path.append('../')
from source.bokeh_plots import *
from source.data_visualization import *
output_notebook()

file_path = '../data/alma_main_results.xlsx'
model_name = 'AML Epigenomic Risk'

# Read the data
df = pd.read_excel(file_path, index_col=0).sort_index()

# Define train and test samples
# (copies, so that later column assignments do not act on a slice of df)
df_train = df[df['Train-Test'] == 'Train Sample'].copy()
df_test = df[df['Train-Test'] == 'Test Sample'].copy()

# Drop the samples with a missing vital status
df_px = df_train[~df_train['Vital Status'].isna()]
# Exclude the GDC_TARGET-ALL batch
df_px = df_px[~df_px['Batch'].isin(['GDC_TARGET-ALL'])]

# Drop the samples with a missing WHO 2022 diagnosis
df_dx = df_train[~df_train['WHO 2022 Diagnosis'].isna()]

# Exclude the classes with fewer than 10 samples
df_dx = df_dx[~df_dx['WHO 2022 Diagnosis'].isin([
                                       'MPAL with t(v;11q23.3)/KMT2A-r',
                                       'B-ALL with hypodiploidy',
                                       'AML with t(16;21); FUS::ERG',
                                       'AML with t(9;22); BCR::ABL1'
                                       ])]

### Select samples from COG AAML1031, 0531, and 03P1 Dx samples
df_cog = df[df['Clinical Trial'].isin(['AAML0531', 'AAML1031', 'AAML03P1'])]
df_cog = df_cog[df_cog['Sample Type'].isin(['Diagnosis', 'Primary Blood Derived Cancer - Bone Marrow',
                                            'Primary Blood Derived Cancer - Peripheral Blood'])]
df_cog = df_cog[~df_cog['Patient_ID'].duplicated(keep='last')]
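
The diagnoses excluded above are listed by hand. As a minimal sketch of the same idea, the classes that fall below a sample threshold can instead be derived from the label counts. The labels and the names `rare_classes` and `min_samples` are hypothetical and only illustrate the logic; with pandas, `Series.value_counts()` would give the same counts.

```python
from collections import Counter

def rare_classes(labels, min_samples=10):
    """Return the set of classes with fewer than `min_samples` non-missing labels."""
    counts = Counter(label for label in labels if label is not None)
    return {cls for cls, n in counts.items() if n < min_samples}

# Toy labels (hypothetical): 12 samples of class A, 3 of class B, 2 unlabeled
labels = ['A'] * 12 + ['B'] * 3 + [None] * 2

to_drop = rare_classes(labels, min_samples=10)

# Keep only labeled samples whose class is frequent enough
kept = [label for label in labels if label is not None and label not in to_drop]
```

Deriving the list this way keeps the exclusion in sync with the data if the cohort changes, at the cost of making the excluded classes less visible in the code.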

Interactive atlas#

Hide code cell source
plot_linked_scatters(df)

Patient Characteristics#

Foundation (unsupervised) model#

Hide code cell source
from tableone import TableOne
from datetime import date

columns = ['Hematopoietic Entity','Age (group years)','Sex',
            'Clinical Trial',]

mytable_cog = TableOne(df_train.reset_index(), columns,
                        overall=False, missing=False,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'Age (group years)':['0-5','5-13','13-39','39-60'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'FLT3 ITD': ['Yes'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']})

mytable_cog.to_excel('../data/pt_characteristics_alma_model_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",'Missing','Discovery','Validation','p-value','Statistical Test']
                        )
Hide code cell output
| | | Overall |
|---|---|---|
| n | | 3308 |
| Hematopoietic Entity, n (%) | Acute lymphoblastic leukemia (ALL) | 700 (28.4) |
| | Acute myeloid leukemia (AML) | 1207 (49.0) |
| | Acute promyelocytic leukemia (APL) | 31 (1.3) |
| | Mixed phenotype acute leukemia (MPAL) | 50 (2.0) |
| | Myelodysplastic syndrome (MDS or MDS-like) | 225 (9.1) |
| | Otherwise-Normal (Control) | 251 (10.2) |
| Age (group years), n (%) | 0-5 | 480 (24.1) |
| | 5-13 | 482 (24.2) |
| | 13-39 | 658 (33.1) |
| | 39-60 | 165 (8.3) |
| | 60+ | 203 (10.2) |
| Sex, n (%) | Female | 883 (49.1) |
| | Male | 914 (50.9) |
| Clinical Trial, n (%) | AAML03P1 | 72 (2.2) |
| | AAML0531 | 628 (19.2) |
| | AAML1031 | 581 (17.8) |
| | Beat AML Consortium | 316 (9.7) |
| | CCG2961 | 41 (1.3) |
| | CETLAM SMD-09 (MDS-tAML) | 166 (5.1) |
| | French GRAALL 2003–2005 | 141 (4.3) |
| | Japanese AML05 | 64 (2.0) |
| | NOPHO ALL92-2000 | 933 (28.6) |
| | TARGET ALL | 131 (4.0) |
| | TCGA AML | 194 (5.9) |
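
The percentages in this table do not divide by the overall n of 3308. A quick arithmetic check, using only the counts printed above, suggests that they are computed over the samples with a non-missing value for each variable. This is inferred from the numbers, not from the tableone configuration.

```python
# Counts copied from the "Hematopoietic Entity" rows of the table above
counts = {
    'Acute lymphoblastic leukemia (ALL)': 700,
    'Acute myeloid leukemia (AML)': 1207,
    'Acute promyelocytic leukemia (APL)': 31,
    'Mixed phenotype acute leukemia (MPAL)': 50,
    'Myelodysplastic syndrome (MDS or MDS-like)': 225,
    'Otherwise-Normal (Control)': 251,
}

n_total = 3308                     # "n" row of the table
n_labeled = sum(counts.values())   # samples with a non-missing entity
n_missing = n_total - n_labeled    # samples without an entity label

# Percentages relative to the labeled samples, rounded to one decimal as in the table
percentages = {level: round(100 * n / n_labeled, 1) for level, n in counts.items()}
```

Under this reading, the reported percentages for each variable sum to roughly 100 even though the counts do not sum to the overall n.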

Fine-tuned (supervised) models#

Hide code cell source
columns = ['Age (years)','Age group (years)','Sex','Race or ethnic group',
            'Hispanic or Latino ethnic group', 'MRD 1 Status',
            'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
            'Risk Group','FLT3 ITD', 'Clinical Trial']

df_test['Age (years)'] = df_test['Age (years)'].astype(float)

# join discovery clinical data with validation clinical data
all_cohorts = pd.concat([df_dx, df_px, df_test],
                         axis=0, keys=['AL Epigenomic Subtype','AML Epigenomic Risk' ,'Validation'],
                         names=['cohort']).reset_index()

# columns = ['Age group (years)','Sex', 'MRD 1 Status',
#             'Leucocyte counts (10⁹/L)',
#             'Risk Group','FLT3 ITD', 'Treatment Arm','Clinical Trial']

mytable_cog = TableOne(all_cohorts, columns,
                        overall=False, missing=False,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'Race or ethnic group':['White','Black or African American','Asian'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'FLT3 ITD': ['Yes'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']},
                        groupby='cohort')

mytable_cog.to_excel('../data/pt_characteristics_fine-tuned_models_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",score_name,'Validation','p-value','Statistical Test']
)
Hide code cell output
| | | AL Epigenomic Subtype | AML Epigenomic Risk | Validation |
|---|---|---|---|---|
| n | | 2445 | 1797 | 201 |
| Age (years), mean (SD) | | 19.3 (19.8) | 19.8 (21.6) | 8.8 (6.0) |
| Age group (years), n (%) | ≥10 | 520 (47.2) | 644 (48.2) | 95 (47.7) |
| | <10 | 581 (52.8) | 693 (51.8) | 104 (52.3) |
| Sex, n (%) | Female | 702 (50.4) | 853 (49.2) | 87 (43.3) |
| | Male | 691 (49.6) | 879 (50.8) | 114 (56.7) |
| Race or ethnic group, n (%) | White | 1052 (80.4) | 1302 (80.4) | 143 (71.9) |
| | Black or African American | 131 (10.0) | 155 (9.6) | 32 (16.1) |
| | Asian | 65 (5.0) | 87 (5.4) | 1 (0.5) |
| | American Indian or Alaska Native | 7 (0.5) | 8 (0.5) | |
| | Native Hawaiian or other Pacific Islander | 7 (0.5) | 10 (0.6) | 2 (1.0) |
| | Other | 46 (3.5) | 57 (3.5) | 21 (10.6) |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 204 (19.3) | 245 (19.0) | 25 (12.6) |
| | Not Hispanic or Latino | 851 (80.7) | 1044 (81.0) | 174 (87.4) |
| MRD 1 Status, n (%) | Positive | 282 (29.7) | 361 (31.4) | 76 (40.2) |
| | Negative | 667 (70.3) | 787 (68.6) | 113 (59.8) |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 572 (52.4) | 646 (48.9) | 88 (44.0) |
| | <30 | 520 (47.6) | 676 (51.1) | 112 (56.0) |
| BM leukemic blasts (%), mean (SD) | | 65.8 (24.1) | 65.1 (24.2) | 60.0 (25.6) |
| Risk Group, n (%) | High Risk | 195 (14.1) | 299 (17.5) | 51 (25.4) |
| | Standard Risk | 620 (44.9) | 849 (49.7) | 87 (43.3) |
| | Low Risk | 566 (41.0) | 561 (32.8) | 63 (31.3) |
| FLT3 ITD, n (%) | Yes | 179 (16.3) | 248 (18.6) | 31 (15.6) |
| | No | 920 (83.7) | 1087 (81.4) | 168 (84.4) |
| Clinical Trial, n (%) | AAML03P1 | 62 (2.6) | 72 (4.1) | |
| | AAML0531 | 510 (21.2) | 628 (35.8) | |
| | AAML1031 | 489 (20.3) | 581 (33.1) | |
| | Beat AML Consortium | 192 (8.0) | 225 (12.8) | |
| | CCG2961 | 31 (1.3) | 41 (2.3) | |
| | CETLAM SMD-09 (MDS-tAML) | 166 (6.9) | | |
| | French GRAALL 2003–2005 | 141 (5.9) | | |
| | Japanese AML05 | 9 (0.4) | 15 (0.9) | |
| | NOPHO ALL92-2000 | 636 (26.5) | | |
| | TARGET ALL | 50 (2.1) | | |
| | TCGA AML | 118 (4.9) | 194 (11.0) | |
| | AML02 | | | 159 (79.1) |
| | AML08 | | | 42 (20.9) |

By prognostic group#

Discovery#

Hide code cell source
def pt_characteristics_by_model(df, model_name, traintest='discovery'):
    """Tabulate patient characteristics grouped by `model_name` and save them to Excel."""
    columns = ['Age (years)', 'Age group (years)', 'Sex', 'Race or ethnic group',
               'Hispanic or Latino ethnic group', 'MRD 1 Status',
               'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
               'Risk Group', 'Clinical Trial', 'FLT3 ITD', 'Treatment Arm']

    mytable_cog = TableOne(df, columns,
                           overall=False, missing=False,
                           pval=True, pval_adjust=False,
                           htest_name=True, dip_test=True,
                           tukey_test=True, normal_test=True,
                           order={'Race or ethnic group': ['White', 'Black or African American', 'Asian'],
                                  'MRD 1 Status': ['Positive'],
                                  'Risk Group': ['High Risk', 'Standard Risk'],
                                  'FLT3 ITD': ['Yes'],
                                  'Leucocyte counts (10⁹/L)': ['≥30'],
                                  'Age group (years)': ['≥10']},
                           groupby=model_name)

    # Save a dated copy of the table next to the input data
    mytable_cog.to_excel('../data/pt_characteristics_' + model_name + '_' + traintest
                         + '_' + str(date.today()) + '.xlsx')

    # Return the HTML rendering so that the notebook displays it
    return mytable_cog.tabulate(tablefmt="html",
                                headers=[model_name + ' ' + traintest, "", 'High', 'Low',
                                         'p-value', 'Statistical Test'])

pt_characteristics_by_model(df_px, model_name, 'discovery')
Hide code cell output
| AML Epigenomic Risk discovery | | High | Low | p-value | Statistical Test |
|---|---|---|---|---|---|
| n | | 849 | 948 | | |
| Age (years), mean (SD) | | 23.8 (24.7) | 16.0 (17.6) | <0.001 | Two Sample T-test |
| Age group (years), n (%) | ≥10 | 303 (50.9) | 341 (46.0) | 0.080 | Chi-squared |
| | <10 | 292 (49.1) | 401 (54.0) | | |
| Sex, n (%) | Female | 395 (47.6) | 458 (50.8) | 0.202 | Chi-squared |
| | Male | 435 (52.4) | 444 (49.2) | | |
| Race or ethnic group, n (%) | White | 636 (80.6) | 666 (80.2) | 0.404 | Chi-squared (warning: expected count < 5) |
| | Black or African American | 81 (10.3) | 74 (8.9) | | |
| | Asian | 42 (5.3) | 45 (5.4) | | |
| | American Indian or Alaska Native | 4 (0.5) | 4 (0.5) | | |
| | Native Hawaiian or other Pacific Islander | 2 (0.3) | 8 (1.0) | | |
| | Other | 24 (3.0) | 33 (4.0) | | |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 103 (18.0) | 142 (19.8) | 0.456 | Chi-squared |
| | Not Hispanic or Latino | 469 (82.0) | 575 (80.2) | | |
| MRD 1 Status, n (%) | Positive | 205 (41.4) | 156 (23.9) | <0.001 | Chi-squared |
| | Negative | 290 (58.6) | 497 (76.1) | | |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 271 (46.2) | 375 (51.0) | 0.089 | Chi-squared |
| | <30 | 316 (53.8) | 360 (49.0) | | |
| BM leukemic blasts (%), mean (SD) | | 66.8 (24.7) | 63.4 (23.7) | 0.005 | Two Sample T-test |
| Risk Group, n (%) | High Risk | 213 (26.0) | 86 (9.7) | <0.001 | Chi-squared |
| | Standard Risk | 511 (62.5) | 338 (37.9) | | |
| | Low Risk | 94 (11.5) | 467 (52.4) | | |
| Clinical Trial, n (%) | AAML03P1 | 40 (4.7) | 32 (3.5) | <0.001 | Chi-squared |
| | AAML0531 | 267 (31.4) | 361 (39.8) | | |
| | AAML1031 | 251 (29.6) | 330 (36.4) | | |
| | Beat AML Consortium | 127 (15.0) | 98 (10.8) | | |
| | CCG2961 | 29 (3.4) | 12 (1.3) | | |
| | Japanese AML05 | 8 (0.9) | 7 (0.8) | | |
| | TCGA AML | 127 (15.0) | 67 (7.4) | | |
| FLT3 ITD, n (%) | Yes | 131 (22.1) | 117 (15.8) | 0.004 | Chi-squared |
| | No | 463 (77.9) | 624 (84.2) | | |
| Treatment Arm, n (%) | Arm A | 132 (43.0) | 178 (45.5) | 0.555 | Chi-squared |
| | Arm B | 175 (57.0) | 213 (54.5) | | |
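
The "expected count < 5" warning on the race or ethnic group test can be checked by hand. Under independence, the expected count of a cell is its row total times its column total, divided by the grand total. The sketch below applies that formula to the counts printed in the table; it only reproduces the arithmetic behind the warning and does not compute the p-value.

```python
# Observed counts (High, Low) from the "Race or ethnic group" rows above
observed = {
    'White': (636, 666),
    'Black or African American': (81, 74),
    'Asian': (42, 45),
    'American Indian or Alaska Native': (4, 4),
    'Native Hawaiian or other Pacific Islander': (2, 8),
    'Other': (24, 33),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected count under independence: row total * column total / grand total
expected = {
    level: tuple(sum(row) * col / grand_total for col in col_totals)
    for level, row in observed.items()
}

# Cells whose expected count falls below 5
small_cells = [
    (level, group, round(value, 2))
    for level, row in expected.items()
    for group, value in zip(('High', 'Low'), row)
    if value < 5
]
```

With sparse categories like these, the chi-squared approximation is less reliable, which is why the warning accompanies the p-value.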

Validation#

Hide code cell source
pt_characteristics_by_model(df_test, model_name, 'validation')
Hide code cell output
| AML Epigenomic Risk validation | | High | Low | p-value | Statistical Test |
|---|---|---|---|---|---|
| n | | 73 | 128 | | |
| Age (years), mean (SD) | | 8.7 (6.3) | 8.8 (5.8) | 0.851 | Two Sample T-test |
| Age group (years), n (%) | ≥10 | 34 (47.2) | 61 (48.0) | 1.000 | Chi-squared |
| | <10 | 38 (52.8) | 66 (52.0) | | |
| Sex, n (%) | Female | 32 (43.8) | 55 (43.0) | 1.000 | Chi-squared |
| | Male | 41 (56.2) | 73 (57.0) | | |
| Race or ethnic group, n (%) | White | 49 (69.0) | 94 (73.4) | 0.698 | Chi-squared (warning: expected count < 5) |
| | Black or African American | 12 (16.9) | 20 (15.6) | | |
| | Asian | 1 (1.4) | | | |
| | Native Hawaiian or other Pacific Islander | 1 (1.4) | 1 (0.8) | | |
| | Other | 8 (11.3) | 13 (10.2) | | |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 12 (16.7) | 13 (10.2) | 0.275 | Chi-squared |
| | Not Hispanic or Latino | 60 (83.3) | 114 (89.8) | | |
| MRD 1 Status, n (%) | Positive | 34 (49.3) | 42 (35.0) | 0.076 | Chi-squared |
| | Negative | 35 (50.7) | 78 (65.0) | | |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 31 (43.1) | 57 (44.5) | 0.957 | Chi-squared |
| | <30 | 41 (56.9) | 71 (55.5) | | |
| BM leukemic blasts (%), mean (SD) | | 62.3 (27.7) | 58.8 (24.5) | 0.401 | Two Sample T-test |
| Risk Group, n (%) | High Risk | 25 (34.2) | 26 (20.3) | <0.001 | Chi-squared |
| | Standard Risk | 39 (53.4) | 48 (37.5) | | |
| | Low Risk | 9 (12.3) | 54 (42.2) | | |
| Clinical Trial, n (%) | AML02 | 58 (79.5) | 101 (78.9) | 1.000 | Chi-squared |
| | AML08 | 15 (20.5) | 27 (21.1) | | |
| FLT3 ITD, n (%) | Yes | 14 (19.4) | 17 (13.4) | 0.353 | Chi-squared |
| | No | 58 (80.6) | 110 (86.6) | | |
| Treatment Arm, n (%) | Arm A | 41 (57.7) | 66 (51.6) | 0.490 | Chi-squared |
| | Arm B | 30 (42.3) | 62 (48.4) | | |

Kaplan-Meier Plots#

Overall study population#

Hide code cell source
for dataset, trial in zip([df_cog, df_test], 
                          ['COG AML trials', 'Validation cohort']):
    draw_kaplan_meier(model_name=model_name,
                        df=dataset,
                        save_survival_table=False,
                        save_plot=False,
                        show_ci=False,
                        add_risk_counts=False,
                        trialname=trial,
                        figsize=(8,8))
Hide code cell output
../_images/b5f8e939133133ca6a378d53e3f06628a4cdcfe9fb52082830f956f365e84f5c.png ../_images/06b5ca8feb951d4e26c0ee9474120315558a97d2fc63757718d4b6a38fa0a574.png
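
The internals of `draw_kaplan_meier` live in the project's `source` module and are not shown on this page. For readers unfamiliar with the estimator behind these curves, here is a hand-rolled sketch of the Kaplan-Meier (product-limit) estimate on toy data. The function name `kaplan_meier` and the data are hypothetical; in practice a library such as lifelines (listed in the watermark) provides `KaplanMeierFitter` for this.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability), one entry per event time.
    """
    survival = 1.0
    curve = []
    at_risk = len(times)
    # Walk through the distinct follow-up times in increasing order
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients with an event and censored patients both leave the risk set
        at_risk -= leaving
    return curve

# Toy data (hypothetical): five patients, two of them censored
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.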

Per risk group#

Hide code cell source
for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):

    risk_groups = ['High Risk', 'Low Risk', 'Standard Risk']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group}',
            figsize=(8, 8))
Hide code cell output
../_images/f34904641067177b4b52947afd56cb228278fbd6858fce53a7c0bf0c5bd20ae4.png ../_images/a406ea3c237c0b63895c1d48301eb70f85aa88975363ccb799f03cfe0f01df33.png ../_images/9809b585c5995e17e4f0f59ee80284e42196dba8a65c47b8f5bc7e4b9592a7a0.png ../_images/6136f84538c06cd97dc0b930940ef8b1480172df40132c33d48bd6d130305d99.png ../_images/6bc3855f83c31527ede890c7cdda0592c3dcecd0dbc593a6ccadfe512f350f25.png ../_images/bc286e371eea02b657d5a691d846a9d6fb9041138e20b2d4cb840b0e897bb558.png

Per risk group (AAML1831 COG)#

Hide code cell source
for dataset, trial in zip([df_cog],['COG AML trials']):

    risk_groups = ['High', 'Low', 'Standard']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group AAML1831'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group} Risk',
            figsize=(8, 8))
Hide code cell output
../_images/16710d35472627b9f21cb558bf15b4669668244c29443969a0533a4e1c4d69d9.png ../_images/fa5434b48919f5fc4c29033fef74a9bb5d7fbb0b1755c8cd551460a021ca4ec3.png ../_images/9a90dff27afd39411af134711c2230ef7f5cab76d1f562ff73db72dc3294273b.png

Forest Plots#

With MRD 1#

Hide code cell source
for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)
Hide code cell output
../_images/c941ff92028b88a554149f4c3711021253563a1c63ff1d6c6178d8ab8dde730b.png ../_images/0cff35550e282cb8afd4fc86eb16a155ab3516753a4f6d1792869ade792022fb.png ../_images/6e3a91eae08264c13c356d02c493288c16e7c8399f01cf4ec041f78e660a033b.png ../_images/eee045a343c9fdd7c424f24b4646f64f7ee8e55e76147b1cd2e494b7a46a858b.png

With MRD 1 and BM blast (%)#

Hide code cell source
for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'])
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot_withBMblast(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot_withBMblast(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)
Hide code cell output
../_images/613d952b6b1945dc98d65faf64fa52a847a704eb985325b43ef2e96f4925ebbf.png ../_images/6cd1896bba646b2f28078cf575f6129f619347101045ec4d31ca6acb84cea4de.png ../_images/d2f6d75ac7f75e5110ef6aa5e37bcfb9bc58bef769052d837fb465214eaa4b24.png ../_images/4e6ad3ac41a5767ed4bb6946c6457214ab9670532bbd17c380279d5c9c128ebd.png

Without MRD 1#

Hide code cell source
for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'])
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot_noMRD(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot_noMRD(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)
Hide code cell output
../_images/b603d61df623b3bf5083f3fd2510223c470f0ab9278918505ff0e40b82acb463.png ../_images/5efc6fb932153b431fe7b02089d437a4e9ddc51cfdced8621b4c9acd59d0398e.png ../_images/c63675dcc022d831721cdceabf05ee62327c7ef665a00ef9f4c24b762a6ede0e.png ../_images/0042b42d0c60fabf65b9258183337ecc94797f5000536e1bda33ae5fd5e26e59.png

ROC AUC performance#

AL epigenomic phenotype#

Hide code cell source
df_dx_auc_train, df_dx_dummies_train = process_dataset_for_multiclass_auc(df_dx)
df_dx_auc_cog, df_dx_dummies_cog = process_dataset_for_multiclass_auc(df_cog)
df_dx_auc_test, df_dx_dummies_test = process_dataset_for_multiclass_auc(df_test)
                                                                        
p1 = plot_multiclass_roc_auc(df_dx_auc_train, df_dx_dummies_train.columns, title='Discovery cohort')
p2 = plot_multiclass_roc_auc(df_dx_auc_cog, df_dx_dummies_cog.columns, title='Discovery COG peds AML Dx')
p3 = plot_multiclass_roc_auc(df_dx_auc_test, df_dx_dummies_test.columns, title='Validation cohort')

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    ], toolbar_location='above')

show(p)
Hide code cell output

AML epigenomic risk (probability) + risk group#

Hide code cell source
# Probability model
model_name = 'P(Death)'
p1 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , title='Discovery cohort')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, title='Discovery COG peds AML Dx')
p3 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, title='Validation cohort')


p4 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , sum_models=True)
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, sum_models=True)
p6 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)
Hide code cell output

Note

The ROC AUC plots may be based on fewer samples than the cohort tables, because samples with missing risk group data were removed.
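
As a rough illustration of this note, the sketch below drops records with a missing risk group and then computes the ROC AUC on what remains. It uses the rank interpretation of the AUC (the probability that a random positive scores higher than a random negative) rather than the plotting helper used in this notebook. The records and the `roc_auc` helper are hypothetical; scikit-learn's `roc_auc_score` computes the same quantity.

```python
def roc_auc(scores, outcomes):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as one half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy records (hypothetical): (predicted probability, event, risk group)
records = [
    (0.9, 1, 'High Risk'),
    (0.8, 1, None),            # missing risk group, excluded below
    (0.7, 0, 'Standard Risk'),
    (0.4, 1, 'Standard Risk'),
    (0.3, 0, 'Low Risk'),
    (0.2, 0, None),            # missing risk group, excluded below
]

# Keep only the records that have a risk group
complete = [r for r in records if r[2] is not None]
auc = roc_auc([r[0] for r in complete], [r[1] for r in complete])
```

Because the excluded records are not a random subset in general, the AUC on the complete cases can differ from the AUC on the full cohort.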

AML epigenomic risk (high-low) + risk group#

Hide code cell source
# Binary model
model_name = 'AML Epigenomic Risk'
p1 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , title='Discovery cohort')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, title='Discovery COG peds AML Dx')
p3 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, title='Validation cohort')


p4 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , sum_models=True)
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, sum_models=True)
p6 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)
Hide code cell output

AML epigenomic risk + latest risk group (AAML1831 COG)#

Hide code cell source
# Probability model
model_name = 'P(Death)'
p1 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name ,risk_group='Risk Group' ,title='Risk group AAML1031-0531')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831' ,title='Risk group AAML1831')
p3 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831', sum_models=True, title='Risk group AAML1831 + Epigenomic Risk')

# Binary model
model_name = 'AML Epigenomic Risk'
p4 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name ,risk_group='Risk Group')
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831')
p6 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831', sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)
Hide code cell output

Box Plots#

Hide code cell source
draw_boxplot(df=df_test,x='Risk Group', y='P(Death)',
                order=['High Risk', 'Standard Risk', 'Low Risk'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))

draw_boxplot(df=df_test,x='MRD 1 Status', y='P(Death)',
                order=['Positive','Negative'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))

draw_boxplot(df=df_test,x='Primary Cytogenetic Code', y='P(Death)',
                order='auto',
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))
Hide code cell output
../_images/9488c355179959b896cc83c0e5fd4874702f98469c343017c47ab1701a3a9621.png ../_images/6b019a457d723c295a5c249693c89e0723984c982cb3d1f09c1936a1a7ff21fe.png ../_images/03cfcb02ed37c8e65131eb6704edd864c59b2b4c6fd5807ef4d067bce3e4214e.png

Stacked Bar Plots#

Hide code cell source
model_name = 'AML Epigenomic Risk'
draw_stacked_barplot(df=df_test,x='MRD 1 Status', y=model_name,
             order=['Positive','Negative'],
             trialname='StJude trials', hue=model_name,
             save_plot=False, figsize=(4,3))

draw_stacked_barplot(df=df_test,x='Risk Group', y=model_name,
                order=['High Risk', 'Standard Risk', 'Low Risk'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,3), fontsize=9)

draw_stacked_barplot(df=df_test,x='Primary Cytogenetic Code', y=model_name,
                order='auto',
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,3), fontsize=6)
Hide code cell output
../_images/36496b43857dda986c883a55b52ea12d9a128199964d39d2f792999b4ded756f.png ../_images/fb0ae7350d7688cf2041b1bf76ae3778479ed5f517dd688dd95b2f7dbc208172.png ../_images/4473d7e5b2efacf6a887444f69c4a8e969bf3e72d97632a927f6b9211d4c1636.png

Sankey plots#

Note

The Sankey plots below compare how the same samples are distributed across two sets of categories. The width of each link is proportional to the number of patients it connects.
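
The link widths come down to counting patients per pair of categories. The sketch below shows that counting step on toy labels. The `sankey_links` helper is hypothetical, and its handling of `nan_action` ('drop' ignores samples with a missing left-hand label, 'keep only' keeps only those samples) is an assumption based on the argument values used in this notebook, not the implementation of `draw_sankey_plot`.

```python
from collections import Counter

def sankey_links(left, right, nan_action='drop'):
    """Count patients per (left category, right category) pair.

    nan_action='drop'      : ignore patients whose left label is missing
    nan_action='keep only' : keep only patients whose left label is missing
    """
    links = Counter()
    for a, b in zip(left, right):
        if b is None:
            continue
        if nan_action == 'drop' and a is None:
            continue
        if nan_action == 'keep only' and a is not None:
            continue
        links[(a if a is not None else 'Missing', b)] += 1
    return links

# Toy labels (hypothetical): annotated diagnosis vs. predicted subtype
diagnosis = ['AML', 'AML', 'ALL', None, None]
subtype = ['AML-like', 'AML-like', 'ALL-like', 'AML-like', 'ALL-like']

annotated = sankey_links(diagnosis, subtype, nan_action='drop')
unannotated = sankey_links(diagnosis, subtype, nan_action='keep only')
```

Each count would become the width of one link, so a link between two categories that share two patients is drawn twice as wide as a link that carries one.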

Samples with annotated diagnosis info#

Hide code cell source
colors = get_custom_color_palette()


draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(4, 11),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(3, 7),
                 fontsize=8, nan_action='drop')
Hide code cell output
../_images/e0ec20293655c29f8feaa727d6d572ed0b131d32a0aecf14e424468ab100ede7.png ../_images/2acbea9d8922eac6d9ee434881dea4fe3f703bf891ac60c94bc60cb760a4d8be.png ../_images/dbc4bbe53c71cf8626132a1e62215b92cdb9ea190449fcf2841866c7e2c494f5.png

Predictions for samples without a WHO 2022 diagnosis#

Hide code cell source
draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')
Hide code cell output
../_images/455b40360c68e2fc4b47434036daa80589a18bb619ca15575d4b5067fabbbed6.png ../_images/804b947d9fcd318b8fdb0a2ee044540211360f07c2fd7bfff7304b4cd4dfedf6.png ../_images/65a040639e0580e4ccd3f02e94c8d4ab56e651a40d86c8407b57d57e5d96ad59.png

Reason for unclassified samples#

Hide code cell source
draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title='Discovery cohort', fig_size=(4, 6),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'Gene Fusion', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title= 'Validation cohort',fig_size=(2, 3),
                 fontsize=8, nan_action='keep only')
Hide code cell output
../_images/584ba3f4b41a1e45c1c7bb6a345f01ccb03d7074e703e429f061e3bb1cd71c4b.png ../_images/c01657bfa790f28d92f1762e7e9746027b933899c8cc0a47a4ec39c98997bf16.png ../_images/9ba84bfe4896f295efbab9b157fb4b0079d165caf9010d92affdcd654ee5a0bd.png

Risk group comparison in COG#

Hide code cell source
draw_sankey_plot(df_cog, 'Risk Group', 'Risk Group AAML1831', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'Risk Group AAML1831', 'AML Epigenomic Risk', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')
Hide code cell output
../_images/f83982441470711056fc53790dbdbb61af52b56f33d6c14a949bf7d6d755015a.png ../_images/754dc6fd6244f0c1468a783e9cd96c701a0ae000aa2fde11391a7281ab75de96.png

Px and Dx model comparison#

Hide code cell source
draw_sankey_plot(df_train, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(3, 8),
                 fontsize=8, nan_action='drop')
Hide code cell output
../_images/8743980551bab510233c3f4c8a5c100043b908fc285b76ebf684d60f46c8f792.png ../_images/369c000ee144a1d1a8b26eccb02fa14d0e585d999c8d0ee8e54c18b4da917d8c.png ../_images/4435a62f0255b8f9c8f0c1bed9056a2d9dfd6ae880fe5646e4146be999092291.png

Watermark#

Author: Francisco_Marchi@Lamba_Lab_UF

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 8.20.0

pandas         : 2.2.0
seaborn        : 0.13.2
matplotlib     : 3.8.2
tableone       : 0.8.0
sklearn        : 1.4.0
lifelines      : 0.28.0
statannotations: not installed

Compiler    : GCC 11.3.0
OS          : Linux
Release     : 5.15.133.1-microsoft-standard-WSL2
Machine     : x86_64
Processor   : x86_64
CPU cores   : 20
Architecture: 64bit